- failover(7M)
-
-
-
- NAME
- failover - disk device alternate path support
-
- SYNOPSIS
- /etc/init.d/failover [init|start]
-
- DESCRIPTION
- Failover provides an infrastructure for defining and managing multiple
- paths to a single disk device or lun. An SGI logical volume manager
- (XLV, XVM) uses this infrastructure to select the path used for access
- to the logical volumes created on the storage devices. In the presence
- of I/O errors, the SGI logical volume manager requests a new path from
- the failover infrastructure for access to the erring logical volumes.
- This path failover requires the logical volume manager's plexing
- software.
-
- Failover is only possible for devices that use dksc(7m), SGI's SCSI
- disk driver.
-
- Failover is not a multi-path load balancing driver.
-
- During system startup, failover automatically detects and configures
- alternate paths (failover groups) to SGI Clariion RAID, SGI TP9100 RAID,
- and SGI TP9400 RAID. To specify a primary path to an SGI RAID, or to
- configure primary and alternate paths to other, more generic devices,
- failover also processes configuration directives contained within the
- /etc/failover.conf configuration file, which allow manual specification
- of a failover group.
-
- Failover uses /sbin/foconfig to parse the configuration file and direct
- the creation of failover groups and the specification of primary paths
- for SGI RAID. /sbin/foconfig should not be executed directly.
-
- Alternate Path Configuration
- Primary and alternate paths to devices are defined by two different
- mechanisms: automatic detection, and manual configuration via a
- configuration file.
-
- Detection of paths to SGI RAID devices is automatic and happens at the
- time of device discovery, during the probing of the SCSI and Fibre
- Channel buses. The detected paths to the SGI RAID together make up a
- failover group. Any path within a failover group can be used for I/O
- requests unless explicit primary path configuration is used (see "Using
- Manual Configuration with SGI RAID" below).
-
- Specification of a primary path to an SGI RAID, or configuration of other
- disk storage devices into failover groups, is declared within the
- /etc/failover.conf configuration file. This file is processed during
- failover startup, and when the /etc/init.d/failover script is executed.
- When /etc/init.d/failover is executed with the start parameter, it
- automatically calls xlv_assemble(1m). When executed with the init
- parameter, the execution of xlv_assemble is skipped.
-
- An entry within /etc/failover.conf which defines a failover group
- consists of a single line, or multiple lines, all except the last ending
- in a \ (backslash). An entry consists of an arbitrary group name, a
- primary path, and optionally up to fifteen alternate paths. The group
- name is an arbitrary string of up to 31 characters. Following the group
- name are the /dev/scsi names associated with the primary and alternate
- paths, the primary being the first path specified.
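
The entry format described above can be illustrated with a short Python sketch. It is not the /sbin/foconfig implementation; the function name parse_failover_conf is hypothetical, and treating every line that begins with # as a comment or directive to skip is an assumption based on the sample file.

```python
MAX_GROUP_NAME = 31   # group name limit stated in the text
MAX_ALTERNATES = 15   # alternate path limit stated in the text


def parse_failover_conf(text):
    """Return a list of (group, primary, alternates) tuples.

    Hypothetical helper: illustrates the documented entry format only.
    """
    entries = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation: accumulate until a line without a backslash.
            pending += line[:-1] + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if not line or line.startswith("#"):
            continue  # assumption: blank lines, comments, directives skipped
        group, *paths = line.split()
        if len(group) > MAX_GROUP_NAME:
            raise ValueError(f"group name too long: {group!r}")
        if not paths:
            raise ValueError(f"group {group!r} has no primary path")
        if len(paths) - 1 > MAX_ALTERNATES:
            raise ValueError(f"group {group!r} has too many alternate paths")
        # The first path listed is the primary; the rest are alternates.
        entries.append((group, paths[0], paths[1:]))
    return entries
```

Calling parse_failover_conf on the text "A sc7d1l0 sc8d1l0" yields one entry whose primary path is sc7d1l0 and whose only alternate is sc8d1l0.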
-
- With manual configuration of failover groups, only the specified primary
- path can be used for I/O requests. This is also the case if the
- configuration file is used to explicitly specify a primary path to an SGI
- RAID.
-
- Using Manual Configuration with SGI RAID
- SGI RAID devices can use the /etc/failover.conf configuration file to
- explicitly specify primary paths, rather than letting a volume manager
- pick one. This is useful because, if multiple controllers can each
- access the same storage (in a SAN environment), volume managers will tend
- to use a single controller to access all storage connected to a given
- storage network, precluding the use of different host adapters to access
- different devices on the storage network.
-
- Specifying a primary path allows the administrator to choose different
- host adapters to access different storage devices, because the volume
- manager will not be able to access storage through the alternate paths.
- This is particularly useful when striping. Only the primary path needs
- to be specified in the /etc/failover.conf file with this option.
- Alternate paths will be automatically detected.
-
- Manual configuration is recommended with the SGI TP9100 RAID, because
- performance to a lun is significantly reduced if both RAID controllers
- are used to access the lun.
-
- Configuration File Directives
- Two configuration directives are available for use within the
- /etc/failover.conf configuration file. These directives, #verbose and
- #disable_target_lun_check, modify the behavior of the /sbin/foconfig
- program used to parse the configuration file. They must be placed at the
- beginning of a line within the configuration file and affect all lines
- following the directive. Once enabled, these options cannot be disabled.
-
- #verbose causes the program to emit debugging information.
-
- #disable_target_lun_check permits the definition of a failover group
- containing disks or luns with differing target and lun numbers.
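
The scope rule for directives (they apply to every following line and cannot be turned off) can be sketched in Python as follows. The function annotate_groups is hypothetical and is not part of /sbin/foconfig; for brevity it assumes one entry per line and ignores backslash continuations.

```python
DIRECTIVES = {"#verbose", "#disable_target_lun_check"}


def annotate_groups(lines):
    """Yield (group_name, active_directives) for each group line, in order.

    Hypothetical helper: models only the "sticky" scope of directives.
    """
    active = set()
    for raw in lines:
        fields = raw.split()
        token = fields[0] if fields else ""
        # A directive must start in the first column of the line.
        if raw.startswith("#") and token in DIRECTIVES:
            active.add(token[1:])  # once enabled, never removed
            continue
        if not token or raw.lstrip().startswith("#"):
            continue  # blank line or ordinary comment
        yield token, frozenset(active)
```

Applied to the sample file in this page, groups A through J and lun16 would carry no directives, priA and priB would carry verbose, and raid1 would carry both verbose and disable_target_lun_check.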
-
- Sample Configuration Entries
- The sample file shows failover groups, each consisting of a primary path
- and one or more alternate paths, followed by entries (priA, priB) that
- specify only a primary path.
-
- #ident $Revision: 1.9 $
- #
- # This is the configuration file for table driven failover support.
- #
- # Please see the failover (7m) manual page for details on
- # how to use this file.
- #
- A sc7d1l0 sc8d1l0
- B sc7d1l1 sc8d1l1
- C sc7d1l2 sc8d1l2
- D sc7d1l3 sc8d1l3
- E sc7d1l4 sc8d1l4
- F sc7d1l5 sc8d1l5
- G sc7d1l6 sc8d1l6
- H sc7d1l7 sc8d1l7
- I 2000002037003be2/lun0/c3p1 2000002037003be2/lun0/c5p2
- J 2000002037003c6c/lun0/c5p2 2000002037003c6c/lun0/c3p1
-
- lun16 2000006016fe0cc0/lun16/c104p0 2000006016fe0cc0/lun16/c108p0 \
- 2000006016fe0cc0/lun16/c110p0 2000006016fe0cc0/lun16/c109p0 \
- 2000006016fe0cc0/lun16/c107p0 2000006016fe0cc0/lun16/c106p0 \
- 2000006016fe0cc0/lun16/c105p0 2000006016fe0cc0/lun16/c103p0
-
- # Cause program to emit debugging information for the following
- # groups.
- #verbose
- # specify a primary path
- priA sc14d11l0
- priB sc15d11l1
-
- # Cause program to ignore target and lun numbering for these raid luns.
- #disable_target_lun_check
- raid1 sc16d10l0 sc17d11l0 sc18d12l0 sc19d13l0
-
-
- Switching to an Alternate Path
- Failover to an alternate path is controlled by an SGI logical volume
- manager (XLV, XVM) and its plexing software. When the logical volume
- manager receives notification of an I/O error, it requests failover to
- switch the erring device to an available alternate path. If the path
- switch is successful, the SGI logical volume manager retries the failed
- I/O using the new path.
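
The error-driven switch described above can be modeled with a small Python sketch. It is a conceptual model only, not XLV, XVM, or kernel code: the class FailoverGroup, the function do_io, the round-robin choice of the next path, and the single retry are all assumptions made for illustration.

```python
class FailoverGroup:
    """Hypothetical model of a failover group: an ordered list of paths."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.current = 0  # index of the path currently in use

    def current_path(self):
        return self.paths[self.current]

    def switch(self, is_usable):
        """Move to the next usable path; return True on success."""
        for step in range(1, len(self.paths)):
            candidate = (self.current + step) % len(self.paths)
            if is_usable(self.paths[candidate]):
                self.current = candidate
                return True
        return False


def do_io(group, send, is_usable):
    """Issue one request; on an I/O error, switch paths and retry once."""
    try:
        return send(group.current_path())
    except IOError:
        if not group.switch(is_usable):
            raise  # no alternate path available: report the original error
        return send(group.current_path())
```

In this model the volume manager role is played by do_io and the failover infrastructure by FailoverGroup.switch; the retry happens only when the switch succeeds, which mirrors the behavior described in the paragraph above.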
-
- The scsifo(1m) command permits the system administrator to manually
- request a switch to an alternate path. Although the scsifo command
- performs the switch, the SGI logical volume manager does not detect it
- until it receives an I/O error on the current path because that path is
- no longer available. The SGI logical volume manager then begins using
- the new path.
-
- Inventory Display
- The hinv(1m) command will display the path status of primary and
- alternate paths configured in the /etc/failover.conf configuration file.
- The following sample hinv output reflects the above sample configuration
- file. Three of the devices have failed over to the alternate path,
- perhaps via the scsifo command.
-
- Integral SCSI controller 7: Version Fibre Channel AIC-1160, revision 1
- Disk drive: unit 1 on SCSI controller 7 (primary path)
- Disk drive: unit 1,lun 1, on SCSI controller 7 (primary path)
- Disk drive: unit 1,lun 2, on SCSI controller 7 (primary path)
- Disk drive: unit 1,lun 3, on SCSI controller 7 (primary path)
- Disk drive: unit 1,lun 4, on SCSI controller 7 (primary path)
- Disk drive: unit 1,lun 5, on SCSI controller 7 (alternate path) DOWN
- Disk drive: unit 1,lun 6, on SCSI controller 7 (alternate path) DOWN
- Disk drive: unit 1,lun 7, on SCSI controller 7 (alternate path) DOWN
- Integral SCSI controller 8: Version Fibre Channel AIC-1160, revision 1
- Disk drive: unit 1 on SCSI controller 8 (primary path)
- Disk drive: unit 1,lun 1, on SCSI controller 8 (alternate path)
- Disk drive: unit 1,lun 2, on SCSI controller 8 (alternate path)
- Disk drive: unit 1,lun 3, on SCSI controller 8 (alternate path)
- Disk drive: unit 1,lun 4, on SCSI controller 8 (alternate path)
- Disk drive: unit 1,lun 5, on SCSI controller 8 (primary path)
- Disk drive: unit 1,lun 6, on SCSI controller 8 (primary path)
- Disk drive: unit 1,lun 7, on SCSI controller 8 (primary path)
- Integral SCSI controller 3: Version Fibre Channel QL2200
- Fabric Disk: node 2000002037003be2 port 1 lun 0 on SCSI controller 3 (primary path)
- Fabric Disk: node 2000002037003c6c port 1 lun 0 on SCSI controller 3 (alternate path)
- Integral SCSI controller 5: Version Fibre Channel QL2200
- Fabric Disk: node 2000002037003be2 port 2 lun 0 on SCSI controller 5 (alternate path)
- Fabric Disk: node 2000002037003c6c port 2 lun 0 on SCSI controller 5 (primary path)
-
- By using the scsiha(1m) command to reprobe the bus to which a down device
- is connected, presuming the device is now responding on the bus, the
- "DOWN" indicator displayed by hinv can be cleared.
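
To find which buses are candidates for a reprobe, the DOWN entries can be picked out of saved hinv output. The Python sketch below is an illustration that matches only the "Disk drive" line format shown in the sample above; the function name down_paths is hypothetical, Fabric Disk lines are ignored, and reporting lun 0 when no lun is printed is an assumption.

```python
import re

# Matches lines such as:
#   Disk drive: unit 1,lun 5, on SCSI controller 7 (alternate path) DOWN
DISK_RE = re.compile(
    r"Disk drive: unit (?P<unit>\d+)(?:,\s*lun (?P<lun>\d+),?)? "
    r"on SCSI controller (?P<ctlr>\d+) "
    r"\((?P<role>primary|alternate) path\)(?P<down>[ \t]+DOWN)?"
)


def down_paths(hinv_text):
    """Return (controller, unit, lun) for each disk path marked DOWN."""
    found = []
    for match in DISK_RE.finditer(hinv_text):
        if match.group("down"):
            lun = int(match.group("lun") or 0)  # assumption: no lun means 0
            found.append((int(match.group("ctlr")),
                          int(match.group("unit")), lun))
    return found
```

The controller numbers in the result identify the buses that would need to be reprobed before hinv stops reporting the paths as DOWN.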
-
- FILES
- /etc/failover.conf
- /etc/init.d/failover
- /etc/init.d/xlv
- /var/sysgen/master.d/failover
-
- SEE ALSO
- autoconfig(1m), dks(5m), ds(7m), hinv(1m), ioconfig(1m), scsifo(1m),
- scsiha(1m), xlv_assemble(1m), and xlv(7m).
-
- NOTES
- The group name specified within the /etc/failover.conf file has no
- external visibility. It cannot be correlated to the group number
- information displayed by the scsifo command.
-